Optimal Iteration Scheduling for Intra- and Inter-Tile Reuse in Nested Loop Accelerators

Authors

  • Maurice Peemen
  • Bart Mesman
  • Henk Corporaal
Abstract

High-level synthesis tools have reduced accelerator design time. However, a complex scaling problem that remains is the data transfer bottleneck. Accelerators require huge amounts of data and are often limited by interconnect resources. Local buffers can reduce communication by exploiting data reuse, but the data access order has a substantial impact on the amount of reuse that can be exploited. Loop transformations such as interchange and tiling can modify the data access order. However, for real applications the design space is huge, and finding the best set of transformations is often intractable. Therefore, we present a new methodology that minimizes data transfer through loop interchange and tiling. In contrast to other methods, we take inter-tile reuse and loop bounds into account. For real-world applications, we show buffer size trade-offs that can give speedups of up to 14x; alternatively, these can substantially reduce the required FPGA resources.
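
To make the difference between intra-tile and inter-tile reuse concrete, the following is a minimal sketch of a toy model, not the methodology of the paper. It assumes a 1-D FIR filter whose output loop is tiled and counts how many input elements must be fetched into a local buffer, with and without keeping the overlap between consecutive tiles. The function names `tile_footprint` and `count_loads` and all parameters are hypothetical and chosen only for illustration.

```python
def tile_footprint(start, size, taps):
    """Input indices read by output tile [start, start + size) of a 1-D FIR filter.

    Assumed access pattern: out[i] = sum(inp[i + k] * w[k] for k in range(taps)).
    """
    return set(range(start, start + size + taps - 1))


def count_loads(n_out, taps, tile, inter_tile_reuse):
    """Count input elements fetched from external memory under a toy buffer model.

    Every tile reuses data inside its own footprint (intra-tile reuse).
    With inter_tile_reuse=True, elements shared with the previous tile are
    assumed to still be in the local buffer and are not fetched again.
    """
    loads = 0
    previous = set()
    for start in range(0, n_out, tile):
        size = min(tile, n_out - start)  # last tile may be partial (loop bound)
        needed = tile_footprint(start, size, taps)
        if inter_tile_reuse:
            loads += len(needed - previous)
        else:
            loads += len(needed)
        previous = needed
    return loads


if __name__ == "__main__":
    for tile in (8, 16, 32):
        print(tile,
              count_loads(64, 9, tile, inter_tile_reuse=False),
              count_loads(64, 9, tile, inter_tile_reuse=True))
```

In this model, 64 outputs with 9 taps and a tile size of 8 need 128 loads when each tile is fetched independently and 72 loads when the overlap with the previous tile is kept. A larger tile narrows the gap at the cost of a larger buffer, which is the kind of buffer size trade-off the abstract refers to.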

Similar resources

Affine Transformations for Communication Minimized Parallelization and Locality Optimization of Arbitrarily Nested Loop Sequences

A long running program often spends most of its time in nested loops. The polyhedral model provides powerful abstractions to optimize loop nests with regular accesses for parallel execution. Affine transformations in this model capture a complex sequence of execution-reordering loop transformations that improve performance by parallelization as well as better locality. Although a significant am...

Affine Transformations for Communication Minimal Parallelization and Locality Optimization of Arbitrarily Nested Loop Sequences

A long running program often spends most of its time in nested loops. The polyhedral model provides powerful abstractions to optimize loop nests with regular accesses for parallel execution. Affine transformations in this model capture a complex sequence of execution-reordering loop transformations that improve performance by parallelization as well as better locality. Although a significant am...

Combining Performance Aspects of Irregular Gauss-Seidel Via Sparse Tiling

Finite element problems are often solved using multigrid techniques. The most time-consuming part of multigrid is the iterative smoother, such as Gauss-Seidel. To improve performance, iterative smoothers can exploit parallelism, intra-iteration data reuse, and inter-iteration data reuse. Current methods for parallelizing Gauss-Seidel on irregular grids, such as multi-coloring and owner-computes ...

Optimal Software Pipelining of Nested Loops

This paper presents an approach to software pipelining of nested loops. While several papers have addressed software pipelining of single (non-nested) loops, little work has been done in the area of applying it to nested loops. This paper solves the problem of finding the minimum iteration initiation interval (in the absence of resource constraints) for each level of a nested loop. The problem is...

Optimal task scheduling at run time to exploit intra-tile parallelism

In this paper we address the issue of iteration space tiling to minimize the completion time of loops when executed on multicomputers. The previous work on tiling assumes atomic execution of tiles to minimize synchronization costs. In this work, we remove the restriction of atomicity of tiles so that internal parallelism within tiles is exploited by overlapping computation with communication on...

Publication date: 2014